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ABSTRACT 

Genomic imprinting is an epigenetic mechanism by 
wliicli alleles of some specific genes are expressed 
in a parent-of-origin manner. It has been observed in 
mammals and marsupials, but not in birds. Until 
now, only a few genes orthologous to mammalian 
imprinted ones have been analyzed in chicken and 
did not demonstrate any evidence of imprinting in 
this species. However, several published observa- 
tions such as imprinted-like QTL in poultry or recip- 
rocal effects keep the question open. Our main 
objective was thus to screen the entire chicken 
genome for parental-allele-specific differential 
expression on whole embryonic transcriptomes, 
using high-throughput sequencing. To identify 
the parental origin of each observed haplotype, 
two chicken experimental populations were used, 
as inbred and as genetically distant as possible. 
Two families were produced from two reciprocal 
crosses. Transcripts from 20 embryos were 
sequenced using NGS technology, producing 
-^200 Gb of sequences. This allowed the detection 
of 79 potentially imprinted SNPs, through an 
analysis method that we validated by detecting 
imprinting from mouse data already published. 



However, out of 23 candidates tested by 
pyrosequencing, none could be confirmed. These 
results come together, without a priori, with 
previous statements and phylogenetic consider- 
ations assessing the absence of genomic imprinting 
in chicken. 

INTRODUCTION 

Parental genomic imprinting is a process that leads to the 
differential expression of alleles depending on their 
parental origin. This phenomenon has been described in 
eutherian mammals, marsupials, plants and insects, but 
has never been observed in birds (1,2). 

Even if not the only one (3), the main theory proposed to 
explain this phenomenon is the parental conflict hypothesis, 
also known as the kinship theory. Briefly, imprinting is 
leading to an imbalance of parental allele contribution, 
and the parental conflict hypothesis argues that the genes 
controlling the allocation of resources to the embryo should 
be particularly affected, with the maternal genome restrict- 
ing the use of resources, allowing to preserve the mother 
and future progeny, and the paternal genome favoring 
growth of his offspring (4-8). Readers are referred to (9) 
for an in-depth discussion of the kinship theory. According 
to this theory, imprinting should be restricted to organisms 
in which maternal resources directly affect the embryo. 
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It would therefore be unlikely to observe this phenomenon 
in oviparous animals (10), where the amount of resources is 
limited by the egg size, controlled by the mother. Genomic 
imprinting evolved both in angiospemi plants and placental 
mammals (11) but is also seen in insects, and it has even 
been shown that it arose several times during mammals' 
evolution due to different selective pressure at several loci 
(12). Regardless of the kinship theory, the appearance of 
genomic imprinting in nonmammalian animals is thus not 
implausible. Genomic imprinting was studied in birds, but 
until now, only the expression of genes known to be im- 
printed in mammals has been analyzed in chickens, most of 
them showing biallelic expression and thus, leading to the 
conclusion of the lack of imprinting in this species: several 
studies have been conducted on IGF2, known to be pater- 
nally expressed in mice and humans (13), all of them except 
one (14) leading to the conclusion that it was biallelically 
expressed (15-19). In this genomic region, the H19 imprint- 
ing center, controlling an imprinted cluster including IGF2, 
is shown to be absent from the chicken genome (19). Other 
genes known to be imprinted in mammals have been shown 
to be biallelically expressed in chicken, such as ASCL2/ 
MASH2, M6PR/IGF2R, DLKl and UBE3A (19-21). 
However, parent-of-origin QTLs have been detected in 
chicken and quail (22-24). Whereas some of them may 
finally not be considered as relevant to genomic imprinting 
due to linkage disequilibrium or bias generated by the ex- 
perimental design (25,26), others appeared to be consistent, 
when using appropriate experimental animal population 
designs and methodology (25,27). Moreover, reciprocal 
effects, of great importance in poultry production, may 
be partly explained by imprinting. Together with several 
molecular features, such as the absence of DNMT3L (key 
factor of the establishment of DNA methylation) in the 
chicken genome (28) and the possible existence of imprint- 
ing mechanisms other than DNA methylation (29,30), these 
contradictory observations encourage to definitely answer 
the question (31). 

Advances in sequencing technologies now allow to study 
genomic imprinting through whole transcriptome 
sequencing (32-34), avoiding any a priori choice of target 
genomic regions, such as the chicken homologous of im- 
printed regions in mammals. We therefore chose Next 
Generation Sequencing (NGS) technology to analyze im- 
printing in chicken, without any preconception on whether 
it exists. Chicken is a model species for numerous studies in 
birds, and the availability of inbred lines is a great advantage 
for our purpose: we set up an experimental design consisting 
in two families from two reciprocal crosses between chicken 
lines, chosen so as to allow the identification of the parental 
origin of each allele in embryonic offspring. We sequenced 
whole embryos transcriptomes, and the relative expression 
level of each parental allele was compared, for the inform- 
ative loci, between both cross ways. 

MATERIALS AND METHODS 

Ethics statement 

Chickens were bred at INRA, UE1295 Pole 
d'Experimentation Avicole de Tours, F-37380 Nouzilly, 



in accordance with European Union Guidehnes for 
animal care. The farm is registered by the Ministry of 
Agriculture with the license number C37-175-1 for 
animal experimentation. The experiment was realized 
under authorization 37-002 dehvered to D. Gourichon. 

Chicken lines and crosses 

To maximize the probability of identifying the parental 
origin of an allele in Fl individuals from two reciprocal 
crosses, we selected 2 chicken lines among 10 
(Supplementary Table SI), available at INRA experimen- 
tal farm (UE1295 PEAT, F-37380 Nouzilly, France). 
Genotyping was performed on 3-12 individuals from 
each population with an lllumina 57K Single Nucleotide 
Polymorphism (SNP) array [57 636 SNPs (35)]. SNPs were 
filtered (call freq > 0.9, GenTrainScore > 0.5, Minor Allele 
Frequency (MAF)>0.01) and allele frequencies within 
each fine calculated with phnk, option — freq (36). 
Genetic similarities between fines (homozygozities for a 
given line) were then calculated (Figure 1) from the 
52 189 filtered SNPs. The most genetically distant pair of 
fines together with the highest homozygosity each were 
selected, i.e. the Line 6 (37) and the Line R~ (38). 
A male and a female from each line were reciprocally 
crossed, and 24 Fl embryos, 12 from each cross (6 
males and 6 females per cross), were harvested from the 
same batch at embryonic day 4.5 (stage 26). Genomic 
DNA and total RNA were concurrently extracted from 
the same samples of crushed whole embryos without 
extraembryonic membranes (AllPrep DNA/RNA Mini 
Kit, Qiagen). RNA quality was measured by a 
BioAnalyzer (Agilent); all samples had a RNA Integrity 
Number > 9.9. 

Parental genomic DNA was extracted from blood 
samples through a high-salt extraction method (39). 

Sequencing 
Chicken data 

RNA sequencing. Libraries with a mean insert size of 
200 bp were prepared foUowing lllumina instruction 
for RNA-Seq analysis, by selecting polyA+ fragments 
(TruSeq RNA Sample Prep Kit) from each sample. 
Samples were tagged to allow subsequent identification, 
amplified by polymerase chain reaction (PCR) and 
quantified by quantitative PCR (Agilent QPCR Library 
Quantification Kit) and then sequenced (paired ends, 
100 bp) in triplicate on an lllumina HiSeq 2000 sequencer 
(lllumina, TruSeq PE Cluster Kit v3, cBot and TruSeq 
SBS Kit v3) by randomizing their position in six different 
sequencing lanes. 

DNA sequencing. Parental DNA was sequenced on four 
lanes of lllumina HiSeq 2000 to aUow the detection of 
SNPs discriminating the two fines. Library preparation 
(mean insert size of 300 bp), DNA quantification and 
sequencing (paired ends, 100 bp) were performed accord- 
ing to the manufacturer instructions (TruSeq DNA 
Sample Prep Kit lllumina, Agilent QPCR Library 
Quantification Kit, TruSeq PE Cluster Kit v3 cBot 
TruSeq SBS Kit v3). 
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Figure 1. Experimental design. (A) Heatmap of genetic similarities between lines. Colors correspond to the inbred level of tested individuals (red is 
inbred). Lines 6 and (red arrows) were chosen to be as inbred and as genetically distant as possible. (B) Reciprocal cross between the two selected 
lines. In case of genomic imprinting, embryos are preferentially expressing one of their parents' allele (this figure shows a case of paternally expressed 
gene). Detection of such event is possible with polymorphisms differentiating the lines. 



Mouse data 

RNA-seq sequences were collected from a previous study 
(40). They correspond to liver mRNA from 18 Fl males 
and females DBA/2J and C57BL/6J reciprocally crossed 
mice (two pools of nine mice). These crosses are termed as 
DXB and BXD in the article. 

Computational analysis 

When not specified, analyses were made with homemade 
Perl, Python and R scripts. 

Genomic sequences analyses 

Reads from DNA sequencing of parents were ahgned on 
the last assembly of chicken genome (Galgal4) using the 
bwa program version 0.6.1, option aln (41). 

Identification of SNP discriminating parents from both 
lines 

Allelic counts for each base position on the genome were 
determined. These counts allowed selecting only loci 
discriminating parents, i.e. positions where both parents 
from each line were homozygous for distinct alleles. 

The functional consequence of each SNP in each tran- 
script was predicted using the Ensembl Variant Effect 
Predictor version 71 (42). 

Transcriptome characterization 

Chicken data. Tophat 2.0.5 software (43) was used to 
align transcript sequences from the Fl embryos on 
Galgal4 chicken assembly. Reads were first mapped 
allowing multiple ahgnments, with the intron length par- 
ameter set between 3 and 25 OOObp to limit the number of 
false positive while discovering most junctions, the 
deletion length was limited to I instead of 3, and the 



mate inner distance in our case corresponded to 200. 
The option read-reahgn-edit-dist was set to 0 to map 
every read in all the Tophat mapping steps to get the 
best possible ahgnment. Finally, a micro exon search 
was applied. Other alignment options were set as default. 

To remove potential PCR reads duplicates from the 
analysis, the program samtools rmdup (41) was run on 
each individual after alignment. Afterward, reads with 
potential multiple locations were discarded. 

Samtools flagstat command (41) was used to generate 
statistics on ahgned reads, notably the total number of 
reads mapped after applying our criteria. 

To characterize the general level of expression in all the 
embryos, cufflinks software 2.0.0 was used on Galgal4.72 
GTF file to quantify expression on known genes (44). 

Mouse data. Sequencing reads were aligned with Tophat 
2.0.5 on an artificial reference genome where all SNPs 
known between DBA/2J and C57BL/6J (45) were con- 
verted to 'N' to prevent the preferential mapping of the 
reference allele. In order not to affect to a large extend 
the alignment efficiency, no penalty was set for reads 
mapping on N in this new reference. A maximum of 
three mismatches was allowed for a read to be kept. 
Finally, a filter was applied to keep only reads uniquely 
mapped to the reference genome. 

Identification of SNP with biased expression 

Transcripts alignments were merged by parental cross 
for further analysis. Allelic counts allowed to select 
SNPs represented with at least 10 reads in each cross, 
with an inverted allelic ratio between both crosses (Fold 
Change >2.5). 

To test the expression bias, a Fisher Exact Test was 
made on every transcribed loci and the significance 
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threshold was fixed at False Discovery Rate (FDR) < 0.05 
using the R package 'q value' (46). 

Validation assays 
Pyrosequencing 

Twenty-three markers were tested on a Qiagen PyroMark 
Q24 sequencer. Primers were designed using PyroMark 
Assay Design software (Supplementary Table S2). PCR 
samples were produced using PyroMark PCR Kit 
(Qiagen). Runs were analyzed by PyroMark Q24 1.0.10 
software with default analysis parameters. 

Droplet Digital PCR 

In total, three markers were tested on Bio-Rad QXIOO^^ 
Droplet Digital PCR (ddPCR) System machine. Primers and 
probes were ordered at Sigma (Supplementary Table S3). 

RESULTS 

Constitution of an informative device 

To detect genomic imprinting for a given gene, it is 
required that we are able to discriminate the parental 
origin of its expressed allele in the Fl embryos. The prin- 
cipal purpose of the Unes' choice was thus to maximize the 
chances to identify from which parent the expressed allele 
had been transmitted to the embryos. To select parents as 
inbred (to reduce intra-population polymorphism) and as 
genetically distant (to maximize inter-population poly- 
morphism) as possible, 3-12 animals from 10 chicken 
populations were genotyped on an Illumina 57K SNP 
array (35). Line R~ and Line 6 were the two that fitted 
the best these criteria (Figure lA) and were reciprocally 
crossed: one male from Line 6 was mated with a female 
from Line R~ and one R~ male was crossed with one 
female from Line 6 (Figure IB). We selected 12 embryos 
from each reciprocal cross, with balanced ratio of six 
males and six females from each cross. 

To detect SNPs discriminating the parents from both 
lines, genomic DNA from all four parents was sequenced. 
A total of 897 648 484 paired-ends reads (100 bp) were 
produced from an Illumina HiSeq 2000 sequencer and 
813407 461 (91%) aligned to the Galgal4 reference 
genome, which was thus covered at 98%, with a mean 
depth of 20x. 

We found 2 298 622 SNPs discriminating the parents 
from both fines on genomic DNA, amounting to 
2.1 ± 0.7 SNP/kb of sequence. When considering the num- 
ber of annotated genes with at least one discriminating 
SNP, we found that 12 303 genes were represented. It 
corresponds to 72% of chicken annotated genes 
(Supplementary Table S4). Among the observed 
discriminating SNPs, 54.9% were localized in intergenic 
regions (Supplementary Figure SI), and 27.4% of the SNP 
detected in the coding region were nonsynonymous, which 
is simfiar to results previously obtained in chicken (47,48). 

To have a better idea of their density in the potentially 
expressed regions, SNPs within exons were considered, 
that is 55 016 SNPs (i.e. 2% of detected SNPs). It corres- 
ponds to a density of 1.3 SNP/kb of coding regions, which 
is sufficient to have a reasonable screening of the exome. 



Nevertheless, we used the total set of available 
discriminating SNPs in the study, to include nonannotated 
regions in the analysis. 

The fraction of these SNPs that wiU be observed in the 
RNA-Seq data represents the number of opportunities to 
link the allelic imbalance of transcripts detected in 
embryos with their parental origin. 

Transcriptome characterization 

To link the parental origin of the alleles to their expression 
pattern in embryos, we performed RNA-Seq on 12 
embryos from each cross. We obtained reliable sequencing 
results from 20 embryos (SRA accession number 
SRP033603), II from Cross I (5 females and 6 males) 
and 9 from Cross 2 (4 females and 5 males); the other 4 
were discarded because of sequencing tags problems. 

From those 20 embryos, 1.33 x lO' and 6.07 x 10*^ se- 
quences were produced from Cross 1 and Cross 2 individ- 
uals, respectively (200 Gb of sequence data altogether), 
corresponding on average to 121.1 million reads and 
67.4 million reads for each individual in Cross 1 and 
Cross 2, respectively. 

After alignment to the chicken assembly, removing of 
PCR duplicates and aUowing only one possible hit, 74% 
of reads were mapped to the reference genome, which cor- 
responds to 42% coverage of the genome (39% for Cross 
1 and 35% for Cross 2, Supplementary Figure S2). We set 
a minimal depth of 10 reads to consider a region as 
covered, to avoid background of low expressed regions. 

Regarding the general expression in chicken embryo, a 
minimal FPKM of 1 was expected (one fragment per 
kilobase per mifiion reads) to caU a gene as 'expressed'. 
Among 17 108 genes already described in chicken (avail- 
able from galgal4.72 GTF file), 11416 genes, i.e. 67% of 
them, had a FPKM superior or equal to 1 in our data set. 

Filters — bias in expression 

RNA sequences were filtered to detect allelic imbalance 
expression at loci discriminating parents from both lines. 
Around 300 million bases were counted as expressed 
in both crosses, meaning that at least one read was 
observed at this position in each cross. To avoid bias in 
expression finked to a lack of depth, a threshold of 10 
reads had to be reached in each cross (which corresponds 
to a minimal depth of 20 in total), leading to 123 532053 
positions, i.e. 40% of base positions expressed in both 
crosses. 

Among these loci, 5 388 148 were polymorphic, and 
therefore potentially informative to test allelic imbalance. 
When crossing these expressed positions to discriminating 
SNPs (201209 positions), we found that 7731 of 1 1 416 
expressed genes (68%) were informative, i.e. containing 
at least one SNP discriminating parents of embryos in 
both crosses (Supplementary Table S4). A Fisher Exact 
Test was appfied on these 5 million SNPs to test if the 
expression profile was different between Cross I and 
Cross 2. At a 5% FDR threshold, we observed 117 740 
loci for which expression profiles were significantly differ- 
ent between Cross 1 and Cross 2, including positions with 
an allelic bias in only one of the crosses. Only positions 
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Figure 2. Genomic contexts of the candidate loci on the chicken 
genome. 



where the allehc ratio was inverted between both crosses 
were potentially candidates, i.e. the minor allele in one 
cross is overrepresented in the reciprocal cross. Because 
the study was made on the whole embryo (a mixture of 
different tissues) a complete extinction of one allele was 
not necessarily expected, and we set a minimal fold change 
of 2.5 for this fiher. A total of 4904 SNPs met these 
criteria. They could either result from polymorphisms in 
parental sequences or from an effective bias in expression. 
A further filtering was done removing the SNPs that, even 
if significant, were heterozygous in the parents and there- 
fore not informative to trace back the allele parental 
origin. For that reason, hnking these candidate SNPs to 
the parental origin of the allele was only possible at loci 
discriminating parents from both crosses. To do so, we 
crossed filtered data with the 2 298 622 discriminating 
SNPs detected in parallel on parents' genomic DNA. We 
checked that the detected allelic imbalance was similar for 
most of the pooled embryos in each cross, to avoid any 
bias due to a high monoallelic expression in few embryos 
(never observed in our candidates). At the end, we ended 
up with 79 potentially imprinted SNPs. These SNPs were 
mostly located in introns (51% of them); the others were 
upstream or downstream of annotated genes (31%), exons 
(6%), 3'-UTR (4%) or intergenic regions (8%) (Figure 2, 
Table 1). This distribution is similar to that of 
discriminating SNPs in sequences declared as expressed 
in both crosses, where 1 1 % of these SNPs are located in 
known annotated transcripts (exons and UTR regions). 
This attribution was based on the known chicken annota- 
tion (Ensembl release 72, http://www.ensembl.org/Gallus_ 
gallus/lnfo/Index). We considered expressed intergenic 
regions as unannotated regions. 

Validation assays 

We tested 23 candidate SNPs to confirm the lUumina 
HiSeq results with pyrosequencing as an independent val- 
idation method (49,50). A total of 16 embryos (8 per cross) 



were tested. Tested candidates were chosen to get a panel 
of different situations highlighted after sequences analysis: 
intronic, exonic, regulatory regions and unannotated 
regions were looked at. Furthermore, candidates with dif- 
ferent depth profiles were tested. To select the positions to 
be tested, we preferentially chose loci where SNPs sur- 
rounding the candidate position had an allelic profile simi- 
larly unbalanced: taking into account these additional 
positions — even if not passing our filters due to a lack of 
depth or an allelic ratio <2.5 — ensure to avoid artifactual 
SNPs. 

First, to avoid any bias due to PCR amplification of an 
allele or to genomic duplication in one cross, we checked 
that the pattern obtained by pyrosequencing on genomic 
DNA from the embryos truly corresponds to a heterozy- 
gous profile. Then we quantified the allelic ratio observed 
on cDNA. 

Unfortunately, while confirming the heterozygous 
status of embryos in each case, we could not observe on 
cDNA any allelic imbalance corresponding to an imprint- 
ing profile (Figure 3). 

Three markers were also tested by droplet digital PCR 
as another complementary quantitative approach. This 
confirmed the pyrosequencing results, further invalidating 
our HiSeq results (Figure 3). 

Method validation with RNA sequencing data from mouse 

As parental genomic imprinting is thought to be absent in 
chicken, we used as 'positive control' the mouse data set 
produced by Lagarrigue and collaborators (40) to detect 
imprinted genes in mouse liver, to validate the approach 
we proposed on chicken. In the mouse study, a reciprocal 
cross was made between DBA/2J (further called D) and 
C57BL/6J (further called B) mouse strains and RNA was 
sequenced from liver samples of 18 Fl mice. 

Out of 285 166 513 reads generated by RNA-seq in both 
mouse crosses, 145 412 592 reads (51%) were mapped 
uniquely on an artificial reference genome where SNPs 
between both mouse strains were replaced by 'N'. It cor- 
responds to 74 733 541 reads for one cross (DXB) and 
70679 051 reads for the reciprocal cross (BXD). 

These sequences were then filtered for bias in expression 
following the same pipeline as for chicken embryos. At a 
5% FDR, we found 20 candidate SNPs with an inverted 
ratio between both crosses (minimum Fold Change of 2.5, 
Table 2). Of these, five are located in three genes known to 
be imprinted in mouse: H13 (33), Pegl3 (51), both 
detected in the original study, and Snurf (52), not 
revealed by the first analysis. A fourth gene, Sgce, also 
detected by Lagarrigue et al., was identified when less 
stringent conditions on sequencing depth threshold were 
applied (i.e. when accepting nine reads in each cross). 

These results confirm the efficiency of our pipehne to 
detect genomic imprinting from RNA-Seq data that 
aUowed not only confirming three genes identified by 
Lagarrigue et al. but also demonstrating the presence of 
another imprinted gene, Snurf, which had not been 
detected in their study. The newly discovered loci have 
to be further analyzed. 



Nucleic Acids Research, 2014. Vol. 42, No. 6 3773 



1 11 



i 

ill' 



3l 



ii 



11 



ill 



ill 



ii 



gS3 



ill 



ii 



si 



if 



1" 



s s s - s g a 



s K a a 



s s ~ 



» a « m 



3 S 



II 



n 



a ft s a 



S!PI§oSS.r,pi 



a r., 



o, a, S 



s a s s 



" <" K 



3 ° S g 



II 



PI S 



s s s 



aassss^aftsSasaag 



33333 



33 



33 



S3 



ii 



3S3 



S3S 



ii 



ii 



Hi 



s a 



ii 



9 a 



s s 



iill 



ilili 



I! 



ill 



ill 



ii 



577^ Nucleic Acids Research, 2014, Vol. 42, No. 6 



s s 



C 5 
12 g 



3 § 



Nucleic Acids Research, 2014, Vol. 42, No. 6 3775 




Cross 1 



Cross 2 



B 


















































i = 


















V 




r 




























i 

Cross 1 : 8 embryos 



Sample 

L_ 



i 

Cross 2 : 8 embryos 



Figure 3. Comparison of results on a candidate gene obtained througli HiSeq, droplet digital PCR and pyrosequencing. Colors are differentiating 
both alleles in sequencing results and ddPCR. (A) Hiseq counts of both alleles at the candidate locus (A allele in green, G allele in blue). 
(B) Pyrosequencing results from one sample of each cross (chosen as representative of average results). Analyses were made on the reverse 
strand. Allele A is overexpressed in both crosses (left: Cross 1 results from one embryo's cDNA, right: Cross 2 results from one embryo's 
cDNA).The highlighted peaks correspond to the SNP where the relative proportion is quantified. (C) Droplet digital PCR on the candidate 
locus from eight samples of each cross (A allele in green, G allele in blue). 



DISCUSSION 

This work represents, to our knowledge, the first tran- 
scriptome-wide investigation of genomic imprinting in 
birds. Previous studies in chicken were made on targeted 
genes, known to be imprinted in mammals, and they 
concluded to the nonexistence of this phenomenon in 
birds (15-18,20,53). To get free of any a priori, investiga- 
tions should not be restricted to targeted loci but consider 
every single chicken gene. RNA sequencing technology 
is an approach meeting these expectations (54,55). 



Transcriptome sequencing has already been used with 
this intention in other species, especially in mouse 
(32,33,54). It not only allowed the confirmation of most 
mammahan imprinted genes but also the discovery of 
putative new ones. 

We thus set up an experimental cross designed for the 
search of parental genomic imprinting in chicken, by using 
RNA-Seq on whole embryos from reciprocal crosses 
between chicken lines, as inbred and genetically distant 
as possible. Our results identified 79 potentially imprinted 
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Table 2. Candidate position obtained from the mouse data set 
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SNPs, mainly located in introns from distinct genes, 
not shown to be imprinted when analyzing their coding 
portion. It has been demonstrated that imprinted 
retrogenes could be hosted in introns of biallelically ex- 
pressed genes in mouse (56), leading us to consider these 
loci as possibly interesting. But in the presence of genes 
truly imprinted, one could expect a higher number of can- 
didate SNPs detected in known coding regions, and these 
results, with a large proportion of SNPs outside the coding 
regions, strengthened the need of experimental vahdations 
of these results. 

These positions out of the coding sequence raise the 
question of the functional significance of the transcribed 
portion of the genome. There is a strong debate in the 
hterature about the proportion of the genome that is 
'functional' (57-60), following the ENCODE resuhs, 
which assigned biochemical functions to 80.4% of the 
human genome, for which the cumulative coverage is 
62.1% for processed transcripts (61). In our analysis, 
only ~40% of the genome is considered to be transcribed, 
with a minimal coverage of 10 reads in the pool of all 
20 embryos data at a single-base scale. This discrepancy 
may be partly explained by our selection of only 
polyadenylated mRNA. Nevertheless, the transcribed 
portion of the genome far exceeds its exonic subset, a 
more and more admitted characteristic of the genome 
(62,63). 

Before vahdation, we ensured that candidate SNPs were 
not declared biased because of only one highly expressed 
embryo (Supplementary Figure S3). None of the candi- 
date loci tested through vahdation experiments was con- 
firmed as imprinted. These candidates being probably 
false positives, we thus validated our methodology by 
reanalyzing a mouse data set, as a 'positive control', 



where imprinting should be detected. By confirming the 
three previously identified imprinted genes [H13, Pegl3 
and Sgce (40)], and by discovering one more in these 
data, SnurJ\ previously known to be imprinted (52), we 
confirmed the reliability of the method. These four genes 
found imprinted in mouse are a severe underestimate of 
the total set of genes known to be imprinted in this species. 
This low number of genes is discussed in the original study 
by Lagarrigue and coworkers (40), notably highlighting 
the fact that only a fraction of imprinted genes are 
expressed in a given tissue at a precise developmental 
stage. Their study was the first, to our knowledge, to 
examine the parent-of-origin specific gene expression in 
adult liver in mouse, and there may be few imprinted 
genes in this tissue. The amount of data is probably the 
main reason for this low number of detected genes. Our 
chicken data set contains 10 times more sequences aligned 
and analyzed than the mouse data set. Considering that 
the mouse has a sHghtly higher number of annotated 
genes, expression can be better estimated in our chicken 
RNA-Seq sequences. With a higher sequencing depth, 
more genes could potentially have been detected. 
Nevertheless, two analysis methods from the same data 
set lead to the detection of imprinted genes in mouse. 
The differences between the results obtained here (four 
genes) and in the previous analysis (three genes) from 
the same data have two main origins: first, the minimal 
depth was originally set to 10 for each allele, which dis- 
carded Smirf, almost totally imprinted, as a candidate 
locus. Moreover, the original study did not take into 
account the noncoding region (explaining the discovery 
of only exonic imprinted SNPs in the first analysis). The 
biological reality of the newly discovered loci has to be 
determined, but this analysis confirmed that imprinting, 
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when existing, is detected with our method. This valid- 
ation is reinforced by the fact that we analyzed 200 Gb 
of chicken sequence (versus 3 Gb for the mouse study), 
leading to the assumption that we should have observed 
signiticant results if imprinting did exist in chicken. On the 
other hand, the use of whole embryos in our study (versus 
only liver samples in mouse) might have affected the final 
results. But our choice was, for the same amount of data, 
to reach a sufficient depth to rehably discover multitissue 
parental imprinting, without having to a priori select one 
single tissue. The choice of the embryo was determined by 
the high number of imprinted genes already known to be 
imprinted in the embryo, compared with single adult 
tissues (64). Moreover, others studies conducted on the 
whole embryo allowed the detection of imprinting in 
mouse (65). If imprinting does exist in chicken at this de- 
velopmental stage, but is observed only in one tissue, or if 
a locus is paternally expressed in some tissues but mater- 
nally expressed in others (66), our device might have not 
allowed to detect such cases. While such profiles were 
already observed, most genes are imprinted in multiple 
tissue types (67). Nevertheless, it would be interesting to 
confirm the absence of imprinting in chicken by studying 
other tissues, extraembryonic membranes in particular: 
most of the known imprinted genes in mammals are ex- 
pressed in the placenta (68,69), notably regulating nutri- 
tional resources to the embryo during development 
(70,71). 

Another question concerning our model's choice was 
the embryonic stage of development. Stage 26 is a devel- 
opmental stage from which a sufficient quantity of RNA 
can be extracted for RNA-Seq and validation experi- 
ments, and the expression of imprinted genes has 
already been described at the equivalent stage in the 
mouse embryo (64). It would also have been interesting 
to check the imprinting status of genes expressed in adult 
brain: by applying the kinship theory to a social context, 
the existence of imprinting would be possible in a species 
showing maternal care, as previously proposed (72). 
Incidentally, several genes have been shown to be im- 
printed in the mouse brain, notably related to social 
behavior (73). But the best bird Hues to study would 
have been in this case from species with strong parental 
care behavior, and inbred chicken lines, despite their suit- 
ability for the RNA-Seq strategy used, are probably not 
the best model in this area. 

Besides practical reasons as availabiUty of inbred fines 
and of a reliable genome assembly, the choice of the 
chicken for this study has been dictated by several argu- 
ments: parent-of-origin QTLs have been detected in this 
species (27), and chicken is an example of farm animals 
used in crossbreeding, where genomic imprinting could be 
important for the utilization of reciprocal effects. It is also 
an important animal model in developmental biology and 
phylogenetics, and is used as a model for other avian 
species, especially in genetics and genomics (47). 

A last comment about the device can be made about the 
sequenced transcripts: we limited the screening to polyA 
RNA, and thus we could not conclude on the potential 
imprinted status of noncoding RNAs, some of which are 
also known to be imprinted in mammals (74). 



With a reliable analysis method but no evidence of im- 
printing in our chicken data set, our results seem to 
confirm the absence of imprinting in chicken, at least in 
the whole embryo at embryonic day 4.5. This analysis is a 
first step in the genome-wide search of genomic imprinting 
in birds. The absence of this phenomenon will be definitely 
confirmed — taking into account the difficulty of claiming 
the absence of such a multifarious mechanism — only when 
other studies will complete this first one: other transcribed 
parts of the genome (such as non-coding RNA), other 
tissues at other developmental stages must be analyzed, 
as well as other chicken lines and, most importantly, 
other bird species. But given previous work on several 
candidate genes (15-21), and absence of differential 
methylated regions according to the parental allele in the 
chicken genome (75), it gives a strong argument, through a 
genome-wide study, in agreement with previous studies. 
This absence would be in accordance with the most 
shared hypothesis, the kinship theory, which leads to the 
assumption that imprinting does not exist in oviparous 
species (10). Another theory concerning the evolution of 
genomic imprinting mechanisms is the 'host defence' hy- 
pothesis, assuming that imprinting appeared consequently 
to mechanisms aimed at silencing foreign DNA elements 
inserted into the genome, and evolved in parallel with 
accumulation of repeats in certain genomic regions 
(12,70,76,77). This is again not in favor of the existence 
of this phenomenon in chicken, a species with a low level 
of repeats and active transposable elements, compared 
with mammals (78). 

The absence of this phenomenon would lead to alterna- 
tive explanations of reciprocal effects. These effects repre- 
sent ~20% of the phenotypic variability of body weight in 
broiler and of egg viability in layers (79), and one cause 
could have been genomic imprinting. But other mechan- 
isms can lead to such effects, as direct maternal effects, 
sex-linked genes or mitochondrial DNA transmission. 
Regarding QTLs detected as showing parent-of-origin 
effects, most of them can result from biases due to the 
animal device, as previously pointed out (25,26,80). 
Rowe and coworkers, in a paper describing a maternally 
expressed QTL obtained in a pedigree avoiding such 
biases, comment their results by stating they 'have insuf- 
ficient evidence to confirm that these are truly imprinted 
effects'. Taking into account the historical discussion 
about the absence of genomic imprinting in chicken and 
our results, it is possible that no QTL truly showing 
parent-of-origin effect does exist in chicken. 

To understand why these false imprinted candidates 
were detected, we tried to investigate the possible biases 
and issues encountered during the analysis. This is not a 
unique situation in RNA sequencing data analysis, as 
some of the results from imprinting analysis through tran- 
scriptome sequencing had also been contested in mouse 
(81,82), highlighting many possible biases inherent to 
RNA sequencing data, from experiments to analyses. In 
particular, some alleles can be underrepresented through 
library preparation, several SNPs can appear to be artifac- 
tual and read aUgnment can be biased due to a better 
mapping of the reference allele (83). Because we were 
looking for inverted allelic ratios between reciprocal 
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crosses in our study, this last issue could not explain our 
results because regions where the reference allele was pref- 
erentially mapped were avoided. We considered that the 
possible problem of artifactual SNPs was also absent from 
our data because we crossed the SNPs observed from the 
RNA-Seq data with the polymorphisms detected between 
the parents by genomic sequencing. Furthermore, all 
the candidate SNPs analyzed through other methods, as 
Sanger sequencing (data not shown) or pyrosequencing, 
confirmed the heterozygous status of the embryo's DNA. 
The possibihty of underrepresentation of an allele due to 
the library preparation remains, even if less likely due to 
the use of reciprocal crosses: to generate such a bias, an 
allele should be 'randomly' retained in one cross, while the 



alternative allele should be 'randomly' kept in the other 
cross during the library preparations. 

Likewise, there is a possible dependence between nu- 
cleotide frequency at a SNP and its position on sequence 
reads (84). This bias can originate from the sequencing 
platform or from a positional bias of the reads across 
the transcripts. In any case, correcting this bias can 
improve expression estimates (85). All 79 positions were 
studied for this putative bias, but only four of them 
do have a median position among reads in the 20% 
extremities (Figure 4). The sequencing depth may be 
another cause of the median deviation from the theoretical 
value of 50 (average position on the read). While location 
of a SNP on a read may randomly vary, lower expression 
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level increases the probability of the median position 
of the SNP differing from this medium position. By 
removing candidate SNPs being in extremities of reads, 
we either remove low expressed loci or effectively biased 
loci. Excluding these four candidates is thus a conservative 
choice. But the unconfirmed allelic imbalance of the 75 left 
ones cannot be explained this way. Moreover, if we 
consider the proportion of forward and reverse reads for 
each candidate, it appears that no overrepresentation of 
one direction of sequencing was detected (no candidate 
SNP with a proportion of forward read <20% or 
>80%). This potential bias was thus not the source of 
false positive in our case. 

If the discovery of false-positive SNPs does not directly 
come from ahgnment or position bias, it can also be linked 
with analysis criteria. One of the main issues in such 
studies is the sequencing depth threshold to declare 
allehc imbalance. Studies made until now used different 
threshold in their analyses (32,33,86). This issue was 
underlined in many allelic imbalance studies using RNA 
sequences data, but until now, no consensus was made on 
a reliable threshold (32,86,87). We set ours to a minimum 
read depth of 10 in each cross, which corresponds to a 
minimal coverage of 20 reads for each locus considered 
in the analysis. This threshold was appropriate to high- 
light imprinted loci in the mouse data set. Thus, taking 
into account the lower sequencing depth in the mouse 
study compared with the chicken one, we should have 
detected imprinted loci in chicken if present: even a 
higher threshold would have brought to light such loci 
from our data set. 

We thus looked at the candidates with regard to the 
general distribution of sequencing depth. Most of them 
have a low expression profile (Figure 5), which could 
lead to false-positive candidates. We can raise the hypoth- 
esis that at some low sequencing depth, i.e. at low expres- 
sion levels, genes might express one allele preferentially, or 
that it is not technically possible to properly amphfy both 
alleles when the gene is expressed under a certain limit. 
Under these conditions, the principal null hypothesis used 
in this study could be modified: instead of a 50:50 ratio 
between both alleles in transcripts of heterozygous 
embryos, the null hypothesis would depend on the expres- 
sion rate of the genes. This hypothesis is in accordance 
with previous studies, highlighting the technical issues 
leading to allelic imbalance (86,88,89). 

By looking to possible biases in our analysis, we tried to 
understand why no genomic imprinting could be observed 
on the candidate SNPs that had been highlighted. This 
leads to several considerations: firstly, results' validation 
is required through a reliable quantitative method, as we 
did mainly by pyrosequencing; secondly, possible experi- 
mental and analytical biases should be considered in the 
final results interpretation and care should be taken 
regarding general sequencing depth, possible alignment 
biases and threshold choices, as previously underlined 
(82,90). 

This study was designed to detect parental genomic 
imprinting in birds, with chicken as a model species. Our 
results show that, at least at this developmental stage, 
there is no genomic region imprinted at the whole 




10 100 1000 10000 100000 

Sequencing Depth (log 10 scale) 



o 




y ■ II I 



10 100 1000 10000 100000 

Sequencing Depth (log 10 scale) 

Figure 5. Sequencing depth distribution for all SNPs (top), with classes 
containing candidate SNPs in gray, and for candidate SNPs (bottom), 
with tested SNPs in black. 



embryo level. This does not exclude specific events 
occurring in a small subset of tissues or at different 
developmental stages that would most probably not 
have been detectable in the combination of tissues used 
in this work. Nevertheless, this study proposes a perform- 
ing method to investigate the existence of genomic 
imprinting at a genome scale, without a priori choice of 
targets genes known to be imprinted in mammals. Our 
results support the fact that, while the evolution of im- 
printing has occurred in a convergent manner for several 
clades, from plants to mammals, it is probably missing in 
birds. 
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